Faster Depth-Adaptive Transformers

Authors

Abstract

Depth-adaptive neural networks can dynamically adjust their depth according to the hardness of input words, and thus improve efficiency. The main challenge is how to measure such hardness and decide the required depth (i.e., the number of layers) to conduct. Previous works generally build a halting unit to decide whether the computation should continue or stop at each layer. As there is no specific supervision for depth selection, the halting unit may be under-optimized and inaccurate, which results in suboptimal and unstable performance when modeling sentences. In this paper, we get rid of the halting unit and estimate the required depths in advance, which yields a faster depth-adaptive model. Specifically, two approaches are proposed to explicitly measure the hardness of input words and estimate the corresponding adaptive depth, namely 1) mutual information (MI) based estimation and 2) reconstruction loss based estimation. We conduct experiments on the text classification task with 24 datasets of various sizes and domains. Results confirm that our approaches can speed up the vanilla Transformer (up to 7x) while preserving high accuracy. Moreover, efficiency and robustness are significantly improved when compared with other depth-adaptive approaches.
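To make the idea of estimating depths in advance concrete, here is a minimal Python sketch of the MI route only. It is an illustration under our own assumptions, not the authors' method: MI is computed between a word's presence in a document and the document label, higher-MI words are assumed to be easier and receive fewer layers through a hypothetical linear mapping, unseen words get the full depth, and a token whose budget is used up simply keeps its state while the other tokens continue to be updated. The functions `word_label_mi`, `assign_depths` and `run_adaptive`, and the toy scalar "layers", are hypothetical.

```python
import math
from collections import Counter


def word_label_mi(docs, labels):
    """MI between each word's presence (True/False) in a document and the label."""
    n = len(docs)
    p_y = {y: c / n for y, c in Counter(labels).items()}
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    mi = {}
    for w in vocab:
        # joint counts of (word present?, label); only nonzero cells appear
        joint = Counter((w in s, y) for s, y in zip(doc_sets, labels))
        p_w = {b: sum(c for (bb, _), c in joint.items() if bb == b) / n
               for b in (True, False)}
        mi[w] = sum((c / n) * math.log((c / n) / (p_w[b] * p_y[y]))
                    for (b, y), c in joint.items())
    return mi


def assign_depths(tokens, mi, max_depth):
    """Hypothetical mapping: highest MI -> 1 layer, zero or unseen MI -> max_depth."""
    top = max(mi.values(), default=0.0)
    depths = []
    for t in tokens:
        score = mi.get(t, 0.0) / top if top > 0 else 0.0
        depths.append(max_depth - round(score * (max_depth - 1)))
    return depths


def run_adaptive(states, depths, layers):
    """Apply layers in order; token k is only updated by layers 1..depths[k]."""
    for i, layer in enumerate(layers, start=1):
        updated = layer(states)  # every layer still sees all current states
        states = [new if d >= i else old
                  for old, new, d in zip(states, updated, depths)]
    return states
```

Because every depth is fixed before the forward pass, no halting decision has to be evaluated at each layer. On the toy corpus `[["great", "movie"], ["terrible", "movie"], ["great", "plot"], ["terrible", "plot"]]` with labels `["pos", "neg", "pos", "neg"]`, "great" perfectly predicts the label (MI = ln 2) while "movie" carries none (MI = 0), so `assign_depths(["great", "movie"], mi, 4)` returns `[1, 4]`. In a real Transformer the frozen states would still serve as attention keys and values for later layers, which the toy `layers` mimic by receiving the full state list.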

Similar resources

Computing Tree-Depth Faster Than 2^n

A connected graph has tree-depth at most k if it is a subgraph of the closure of a rooted tree whose height is at most k. We give an algorithm which for a given n-vertex graph G, in time O(1.9602^n) computes the tree-depth of G. Our algorithm is based on combinatorial results revealing the structure of minimal rooted trees whose closures contain G.
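For small graphs, tree-depth can be computed straight from the standard recursive characterization that matches the definition above: the empty graph has depth 0, a single vertex has depth 1, a disconnected graph takes the maximum over its components, and a connected graph needs 1 plus the minimum depth left after deleting one vertex (the root of the closure tree). The memoized sketch below enumerates vertex subsets, so it runs in roughly 2^n time up to polynomial factors; it is a brute-force baseline for checking small cases, not the O(1.9602^n) algorithm of the cited paper. The name `tree_depth` and the edge-list input format are our own choices.

```python
from functools import lru_cache


def tree_depth(n, edges):
    """Exact tree-depth of a graph on vertices 0..n-1 (exponential time)."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def components(vs):
        # connected components of the subgraph induced by the vertex set vs
        seen, comps = set(), []
        for s in vs:
            if s in seen:
                continue
            seen.add(s)
            stack, comp = [s], set()
            while stack:
                x = stack.pop()
                comp.add(x)
                for y in adj[x]:
                    if y in vs and y not in seen:
                        seen.add(y)
                        stack.append(y)
            comps.append(frozenset(comp))
        return comps

    @lru_cache(maxsize=None)
    def td(vs):
        if not vs:
            return 0
        if len(vs) == 1:
            return 1
        comps = components(vs)
        if len(comps) > 1:
            # disconnected: each component gets its own rooted tree
            return max(td(c) for c in comps)
        # connected: try every vertex as the root of the closure tree
        return 1 + min(td(vs - {v}) for v in vs)

    return td(frozenset(range(n)))
```

As a sanity check, a path on 7 vertices has tree-depth 3 (root it at the middle vertex, leaving two 3-vertex paths of depth 2), and the complete graph on 4 vertices has tree-depth 4.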

Monad Transformers as Monoid Transformers

The incremental approach to modular monadic semantics constructs complex monads by using monad transformers to add computational features to a preexisting monad. A complication of this approach is that the operations associated to the pre-existing monad need to be lifted to the new monad. In a companion paper by Jaskelioff, the lifting problem has been addressed in the setting of system Fω. Her...

Faster Coordinate Descent via Adaptive Importance Sampling

Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically character...
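The idea of sampling probabilities that change at every iteration can be illustrated with a minimal sketch of randomized coordinate descent on a strongly convex quadratic f(x) = ½ xᵀAx − bᵀx, assuming A is symmetric positive definite. The sampling rule here, probability proportional to the magnitude of each partial derivative, is a simple stand-in of our choosing, not the dual-residual or primal-dual-gap rules of the cited paper; `adaptive_cd` and the quadratic objective are hypothetical.

```python
import random


def adaptive_cd(A, b, iters=2000, seed=0):
    """Coordinate descent on 0.5*x^T A x - b^T x with |gradient|-weighted sampling."""
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    g = [-bi for bi in b]  # gradient A x - b at x = 0, updated incrementally
    for _ in range(iters):
        weights = [abs(gi) for gi in g]
        if sum(weights) == 0.0:
            break  # exact stationary point reached
        # adaptive rule: the sampling distribution is recomputed every iteration
        i = rng.choices(range(n), weights=weights)[0]
        step = g[i] / A[i][i]  # exact minimization along coordinate i
        x[i] -= step
        for j in range(n):
            g[j] -= A[j][i] * step  # keep g equal to A x - b
    return x
```

After each exact coordinate step the chosen partial derivative becomes zero, so this rule never resamples the same coordinate immediately; for `A = [[4, 1], [1, 3]]` and `b = [1, 2]` the iterates approach the solution (1/11, 7/11).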

Faster Adaptive Set Intersections for Text Searching

The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corrob...
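The truncated abstract does not describe the paper's exact interpolation-search variant, so the following is only a minimal sketch of the general technique, assuming sorted lists of distinct integers: each key of the shorter list is located in the longer one by interpolation search, and the search resumes from where the previous key stopped. The names `lower_bound_interp` and `intersect` are hypothetical.

```python
def lower_bound_interp(arr, key, lo=0):
    """Smallest index i >= lo with arr[i] >= key (len(arr) if none).

    arr must be sorted ascending and hold integers (the interpolation
    step uses integer arithmetic).
    """
    hi = len(arr)  # the answer always lies in [lo, hi]
    while lo < hi:
        if arr[lo] >= key:
            return lo
        if arr[hi - 1] < key:
            return hi
        # now arr[lo] < key <= arr[hi - 1]: guess the position by linear interpolation
        pos = lo + (key - arr[lo]) * (hi - 1 - lo) // (arr[hi - 1] - arr[lo])
        if arr[pos] < key:
            lo = pos + 1
        else:
            hi = pos
    return lo


def intersect(a, b):
    """Intersection of two sorted integer lists without duplicates."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    out, lo = [], 0
    for key in small:
        # keys increase, so the search can resume where the last one stopped
        lo = lower_bound_interp(large, key, lo)
        if lo == len(large):
            break
        if large[lo] == key:
            out.append(key)
    return out
```

For example, `intersect([1, 3, 5, 7, 9, 11], [2, 3, 4, 5, 10, 11, 20])` returns `[3, 5, 11]`. Interpolation search pays off when keys are roughly uniformly spread; on skewed data a galloping (doubling) search is the usual safer choice.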

Faster Ray Tracing Using Adaptive Grids

Efficient ray tracing has been a contradiction in terms since its introduction as a computer version of ideas found in Dürer's Underweysung der Messung (1525) and Descartes' La Dioptrique (1637). The most effective acceleration techniques developed to reduce ray tracing's high computational cost are based on space coherence: bounding box hierarchies and space subdivision. During pre-processing...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i15.17584